class: center, middle, inverse, title-slide

.title[
# Lecture 23
]
.subtitle[
## MLR with Categorical Variables: Interactions
]
.author[
### Psych 10 C
]
.institute[
### University of California, Irvine
]
.date[
### 05/25/2022
]

---

## Review

- Last class we talked about how to include discrete variables in a multiple linear regression model.

--

- The models that we tested assumed that there was no interaction between our continuous variable (age) and our discrete variable (sex).

--

- In other words, the effect of age on blood pressure was the same regardless of the group a participant belonged to (male or female).

--

- However, we can also consider interactions between discrete (categorical) variables and continuous variables.

--

- Today we will work with an example that we talked about at the start of the quarter: a recognition memory experiment.

---

## Study

- We are interested in the effects of two variables on recognition memory: the time elapsed since studying a list of words, and the age of the participant.

--

- We have data from an experiment where participants first studied a list of 100 words.

--

- After the study session, each participant completed a recognition memory task.

--

- The time between study and test varied randomly across participants.

--

- The dependent variable in this study was the number of correctly recognized words during the test phase.

--

- We have two independent variables in the study: the time elapsed between the study and test phases, and the age of the participant in years.

---

## Summary of variables in the study

- The average number of correctly recognized words was 81.7, with a range between 55 and 100 words.

--

- The time elapsed between study and test was 29.48 minutes on average, with a range between 20 and 50 minutes.

--

- Finally, participants had a mean age of 42.2 years, with a range between 20 and 64 years.

--

- Before we move forward with the models, let's look at the distribution of each variable.
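
--

The summary statistics above (mean and range of each variable) can be computed in base R along these lines. This is only a sketch: the data frame name `memory` and the column names `correct`, `time` and `age` are assumptions, and the numbers are made-up toy values, not the data from the experiment.

```r
# Hypothetical toy data, NOT the lecture's data set. The data frame name
# (memory) and column names (correct, time, age) are assumptions.
memory <- data.frame(
  correct = c(95, 88, 80, 72, 60, 90),  # words correctly recognized
  time    = c(20, 25, 30, 40, 50, 22),  # minutes between study and test
  age     = c(22, 30, 55, 60, 64, 25)   # age in years
)

# Mean, minimum and maximum of each column (one column per variable)
summary_stats <- sapply(memory, function(x) c(mean = mean(x), min = min(x), max = max(x)))
round(summary_stats, 2)
```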
---

## Number of correctly recognized words

<img src="data:image/png;base64,#lec-23_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

---

## Time between study and test

<img src="data:image/png;base64,#lec-23_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

## Participant's age

.pull-left[
<img src="data:image/png;base64,#lec-23_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="data:image/png;base64,#lec-23_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />
]

--

- We should always graph our data before doing any data analysis.

--

- As we can see in the graph, the average participant age is not a good summary: it falls in the gap between the two clusters of ages, at a value that we never observed in our sample.

--

- In such cases, it is better to check whether our observations form groups before reporting summary statistics.

---

## Reporting summary statistics by group

- In this case, we know that there are two age groups in the experiment, which means that we can report separate summary statistics for each.

--

- We could report something like this:

--

- We consider two populations in the study: young and elderly participants.

--

- The average age of young participants was 27.3 years, with a range between 20 and 34 years.

--

- On the other hand, the average age of participants in the elderly population was 57.1 years, with a range between 50 and 64 years.

--

- Now that we know that we have two different groups of participants, we need to decide how we are going to include the variable "age" in our models.

--

- There is no perfect way to deal with this problem: each method (using age as a continuous or as a discrete variable) has its advantages and disadvantages.
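
--

The by-group summaries described above can be sketched in base R as follows. The ages are made-up toy values (not the lecture's data) chosen to form two clusters, and the 40-year cut point used to label the clusters is an assumption that only works because no ages fall between them. The overall mean printed first lands in the gap, which is exactly why a single average age is misleading here.

```r
# Hypothetical ages, NOT the lecture's data: two clusters with a gap between them
age <- c(22, 30, 34, 20, 55, 60, 64, 50)

# Overall mean lands in the gap, at a value no participant has
mean(age)

# Assign each age to a cluster; the 40-year cut point is an assumption that
# works here only because no ages fall between the two clusters
group <- ifelse(age <= 40, "young", "elderly")

# Mean, minimum and maximum within each group (one column per group)
by_group <- sapply(split(age, group),
                   function(x) c(mean = mean(x), min = min(x), max = max(x)))
by_group
```

With dplyr, `group_by()` followed by `summarise()` would produce the same table.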
---

## From continuous to discrete variables

- As we did last week, we will transform our continuous measure of age into a categorical variable that takes the value 0 for young participants and the value 1 for elderly participants.

--

- This will make us lose some information, but it will allow us to test for an interaction effect between **age group** and the time elapsed since study.

--

- To create our new categorical variable we can use the following code:

--

```r
library(dplyr)  # provides %>%, mutate() and case_when()
# 0 = young (age <= 40), 1 = elderly (age > 40)
memory <- memory %>%
  mutate(age_group = case_when(age <= 40 ~ 0, age > 40 ~ 1))
```

--

- This code adds a new column named "age_group" to our data, which we can now use in our models.

---

## Interaction models in MLR

- As always, the first model that we need is the null model, which assumes that the expected number of words correctly recognized by a participant is constant, regardless of the time elapsed since study or the age group that the participant belongs to.

--

- The second model is a simple linear regression that includes only the age group of a participant as a predictor of the expected number of words.
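
--

The two models above can be fit with base R's `lm()`. This is a minimal sketch on made-up toy data, assuming the outcome column is called `correct` and the 0/1 predictor is `age_group`. Because `age_group` is coded 0/1, the intercept of the second model is the mean of the young group and the slope is the difference between the elderly and young group means, while the null model's only coefficient is the overall mean.

```r
# Hypothetical toy data, NOT the lecture's data. Column names are assumptions:
# correct = words recognized, age_group = 0 (young) or 1 (elderly).
memory <- data.frame(
  correct   = c(90, 94, 86, 78, 74, 70),
  age_group = c(0, 0, 0, 1, 1, 1)
)

# Null model: the expected number of correct words is a single constant
null_model <- lm(correct ~ 1, data = memory)

# Simple linear regression with age group as the only predictor
group_model <- lm(correct ~ age_group, data = memory)

coef(null_model)   # intercept = overall mean
coef(group_model)  # intercept = young mean, slope = elderly mean - young mean
```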